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Abstract 

Driessel [Computing canonical forms using flows, Linear Algebra and Its Applications 2004] 
introduced the notion of quasi-projection onto the range of a linear transformation from 
one inner product space into another inner product space. Here we introduce the notion of 
quasi-projection onto the intersection of the ranges of two linear transformations from two 
inner product spaces into a third inner product space. As an application, we design a new 
family of iso-spectral flows on the space of symmetric matrices that preserves zero patterns. 
We discuss the equilibrium points of these flows. We conjecture that these flows generically 
converge to diagonal matrices. We perform some numerical experiments with these flows 
which support this conjecture. We also compare our zero preserving flows with the Toda 
flow. 
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1 Introduction. 



Let A be a set of pairs of integers between 1 and n which satisfies the following 

conditions: (1) for i = 1,2, n, the diagonal pair (i, i) is in A, and (2) if the pair is in 
A then so is the symmetric pair We regard A as a (symmetric) sparsity pattern of 

interest of nonzero entries for matrices. In particular, let Sym(n) denote the vector space 
of symmetric, nxn, real matrices and let Sym(A) denote the subspace of Sym(n) consisting 
of the symmetric matrices which are zero outside the pattern A; in symbols, 

Sym(A) := {X G Sym(n) : X(i,j) ^ implies G A}. 

In this report we consider the following task: Find flows in the space Sym(A) which 
preserve eigenvalues and converge to diagonal matrices. We can describe this task more 
precisely as follows: With an n x n symmetric matrix A, we associate the iso-spectral 
surface, Iso(A), of all symmetric matrices which have the same eigenvalues as A. By the 
spectral theorem, we have 

Iso(A) := {QAQ T : Q G 0(n)} 

where 0{n) denotes the group of orthogonal matrices. 

We shall use the Frobenius inner product on matrices; recall that it is defined by {X, Y) : = 
Trace(XY T ) . With a symmetric matrix D, we associate a real- valued 'objective' function 

/ := Sym{n) -> R : X h-> (1/2) (X -D,X- D). 

Note that / is a measure of the distance from X to D. We shall consider the following 
constrained optimization problem: 

Problem 1 Given A G Sym(A), minimize f(X) subject to the constraints X G Iso(A) and 
X G Sym(A). 

In particular, we shall describe a flow on the surface Iso(A) n Sym(A) which solves this 
problem in the sense that it usually converges to a local minimum. 
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Here is a summary of the contents of this report. 

In the next section which is entitled "Quasi-projection onto the intersection of two sub- 
spaces", we present some theoretical background material. Driessel[2004] introduced the 
notion of quasi-projection onto the range of a linear transformation from one inner prod- 
uct space to another. In this section we introduce the notion of quasi-projection onto the 
intersection of the ranges of two linear transformations A and B from two inner product 
spaces into a third inner product space. We use the notation \(A,B) to denote our quasi- 
projection operator. We show that \(A, B) = 2A(A + B) + B where the superscript + denotes 
the Moore-Penrose pseudo inverse operation. 
Remark: If A and B are invertible then 

\{A, B) = 2A{A + B)- l B = 2(A~ 1 + B' 1 )' 1 . 

This operator is called the "harmonic mean" of the operators A and B. See, for example, 
Kubo and Ando [1980]. They use the the infix notation A\B to denote the harmonic mean 
of A and B where A and B are positive operators on a Hilbert space. After we wrote this 
paper in 2001, Chandler Davis told us about this paper by Kubo and Ando. This paper 
led us to the following papers: Anderson and Duffm[1969], Anderson[1971], Anderson and 
Schreiber[1972], Anderson and Trapp[1975]. In particular, Anderson and Duffin define the 
"parallel sum" of semi-definite matrices A and B by the formula A(A + B) + B and denote 
it by A : B. We discovered that most of the results in Section 2 appear scattered in these 
earlier papers (but usually with different proofs). In order to keep this paper somewhat 
self-contained we retained our proofs. 

In the third section which is entitled "An iso-spectral flow which preserves zeros", we 
describe an application of the quasi-projection method. In particular, we describe how we 
used this method to design a new flow corresponding to the optimization problem described 
above. We conjecture that this flow generically converges to a symmetric matrix E that 
commutes with D. Note that if we choose D to be a diagonal matrix with distinct diagonal 
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entries then E commutes with D iff E is a diagonal matrix. (For background material on 
differential equations see, for example, Hirsch, Smale and Devaney [2004].) 

In the fourth section which is entitled "Numerical results" , we describe our implementa- 
tion of our iso-spectral zero-preserving flow in Matlab. We also describe several numerical 
experiments that we performed using this computer program. 

In all our experiments this flow converges (sometimes slowly) to a diagonal matrix. Con- 
sequently these experiments provide evidence for the conjecture described above. We do not 
claim our program to be competitive with standard methods used to compute eigenvalues. 
But we hope our ideas will lead eventually to practical, competitive methods for finding 
eigenvalues of some classes of structured matrices. 

In a first appendix which is entitled "Comparing projections and quasi-projections", we 
describe the origin of the quasi-projection method. In particular, we review a standard 
method of projection onto the intersection of the ranges of two linear maps. We show how 
quasi-projection arises by simplifying this standard projection procedure. We also argue that 
quasi-projection is simpler, more direct and more robust than projection. 

In a second appendix which is entitled "On the Toda flow", we indicate our current 
geometrical view of the so-called Toda flow or QR flow. Most of the results in this appendix 
are known. We present these results to show the analogies between the iso-spectral Toda 
flow and our iso-spectral, zero-preserving flow. These analogies provided the basis for our 
development of these new flows. (We have repeated some of the definitions of notation in 
this appendix. We want to make this appendix self-contained. We hope that a reader can 
understand it without knowledge of the rest of this report.) 

For another example of a structured iso-spectral flow see Fasino [2001]. 
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2 Quasi-projection onto the intersection of two sub- 



spaces. 

In this section we shall present some theoretical background material concerning quasi- 
projections. We shall apply this material in the next section. Let V be a finite-dimensional, 
real inner product space. We use (x, y) to denote the inner product of two elements of V. 
Let A : V — > V and B : V — > V be (self-adjoint) positive semi-definite linear operators on 
V. For any vector c in V, consider the following system of linear equations for u and A in V: 

u - AX = Ac (ql) 
(A + B)X = (B- A)c (q2) 

We call these equations the quasi-projection equations determined by A, B and c. 
Remark: In this section we usually assume that A and B are two positive semi-definite 
operators on a finite dimensional space. These assumptions simplify the analysis consider- 
ably. They will be obviously satisfied in the application considered below. However, many 
of the results in this section are true in more general settings. 



Note that (ql) is equivalent to the following condition: 

u = A(X + c). (eql) 

Hence u is in the range of A. Also note that (q2) is equivalent to the following condition: 

A(X + c) = B(-X + c). (eq2) 

Hence u is also in the range of B. Thus we see that u is in the intersection of the range of 
A and the range of B. 

Remark: We sometimes use f.x or fx in place of f(x) to indicate function application. We 
do so to reduce the number of parentheses. We also use association to the left. For example, 
D{uj.A).I .K means evaluate oj at A to get a function, differentiate this function, evaluate 
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the result at / to get a linear function, and finally evaluate at K. We adapted this notation 
from the programming language C (in which such a dot notation is used in connection with 
data structures). 

We shall use the following lemma repeatedly. 

Lemma 1 If A and B are positive semi- definite operators then 

Kernel(A + B) = Kernel. A n Kernel. B , 
Range(A + B) = Range. A + Range. B. 

Proof: If Az = Bz = then (A + B)z = 0. Now assume (A + B)z = 0. Then = 
{z,{A + B)z) = (z,Az) + (z,Bz). Since A and B are positive semi-definite, we get = 
(z, Az) = (z, Bz) and hence = Az = Bz. The second equation of this lemma is obtained 
from the first one by taking orthogonal complements. □ 

The following proposition shows that the vector u is uniquely determined by the quasi- 
projection equations. 

Proposition 1 (Uniqueness) Let A and B be positive semi-definite operators. For any 
c G V , if (wi,Ai) and {u2,\i) (ire solutions of the quasi-projection equations (ql) and (q2) 
then U\ = u 2 , A\ 1 = A\ 2 and B\i = B\ 2 . 

Proof: Let u := U\ — u 2 and A := Ai — A2. Then we have u — AX = and (A + B)X = 0. 
By Lemma 1 we get A\ = BX = 0. Then u = AX = 0. □ 

The following proposition shows that solutions of the quasi-projection equations always 
exist. 

Proposition 2 (Existence) Let A and B be positive semi-definite operators. For all c & V , 
there exist u and X in V satisfying the quasi-projection equations (ql) and (q2). 
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Proof: It clearly suffices to show that there is a A in V such that (A + B)X — (B — A)c. 
In other words, we need to see that (B — A)c G Range(A + B) — Range. A + Range. B. For 
this we simply note (B — A)c = A{— c) + Be G Range. A + Range. B. □ 

Let ! (A, B) : V — > V denote the linear operator on V which maps a vector c to the unique 
vector u which satisfies the following condition: There exists A G V, such that the pair (u, A) 
satisfies the quasi-projection equations (ql) and (q2). We call the vector u = \(A,B).c the 
quasi-projection of c onto the intersection of Range. A and Range. B. Following Anderson 
and Duffin [1969] we call \(A, B) the parallel sum of A and B (even though there is a 
difference of a factor of 2). 

For any linear map M between inner product spaces let M* denote the adjoint map 
which is defined by the following condition: for all x in the domain of M and all y in the 
codomain of M, (Mx,y) = (x,M*y). (Halmos [1958] uses this notation for the adjoint.) 
The following proposition shows how quasi-projection behaves with respect to congruence. 

Proposition 3 (Congruence) Let M : V — > V be any invertible linear map. Then 
M(\(A,B))M* = \(MAM*, MBM*). 

Proof: The pair of equations (eql) and (eq2) is equivalent to the following pair: 

Mu = MAM*(M*)-\c + A), 
MAM*(M*)-\c + A) = MBM*(M*y 1 (c - A). 

Hence, for all c in V, we have 

M(\(A, B))c AM' . MBM' ){ M' ) v. 

□ 
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Let U and V be inner product spaces and let L : U — > V be a linear map. We use 
L+ to denote the Moore-Penrose pseudo-inverse of L. (See, for example, Lawson and Han- 
son [1974].) We list the following properties of the pseudo-inverse 

L*+ = L + \ LL+L = L, L + LL + = L + , 

and note that LL + is the projection of V onto Range. L and L + L is the projection of U onto 
Range. L* . 

Lemma 2 Let A and B be positive semi- definite operators on an inner product space V . 
Then 

A = A(A + B)(A + B) + = A(A + B) + {A + B) = (A + B)(A + B) + A ={A + B) + (A + B)A. 

Proof: Note that P := (A + B){A + B) + = (A + B) + (A + B) is the projection of V onto 
the range of A + B. In particular, by Lemma 1, for all x in the range of A, we have Px = x. 
Also note that V = Range. A © Kernel. A since (Range. A) 1 - = Kernel. A* = Kernel. A. 

Now consider any x E V. Note that P Ax = Ax because Ax is in the range of A which is 
a subset of the range of A + B. Hence A = PA. Since A is self-adjoint we also have A = AP. 
□ 

The following proposition is our main result concerning quasi-projections. We shall use 
it below to design zero preserving flows. 

Proposition 4 (Quasi-Projection Formulas) Let A and B be positive semi-definite op- 
erators. Then the quasi-projection operator is given by the following formulas: 

\(A, B) = 2A(A + B) + B = 2B(A + B) + A. 

Furthermore, the quasi-projection operator is positive semi- definite. Its range equals the 
intersection of the range of A and the range of B and its kernel equals the sum of the kernel 
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of A and the kernel of B; in symbols, 

Range(\(A, B)) = Range. A n Range. B 
Kernel(\(A, B)) = Kernel. A + Kernel. B. 

Proof: 

Claim: \(A, B) = 2A(A + B)+B . 

We take A := (A + B) + (B — A)c. This A satisfies the quasi-projection equation (q2). 
Substituting in equation (ql), we get u = \(A,B)c = (A(A + B) + (B — A) + A)c. Using 
Lemma 2, we get A(A + B) + (B - A) + A = 2A(A + B) + B . 

Claim: A(A + B) + B = A- A(A + B)+ A . 

Using Lemma 2 again yields A - A(A + B)+A = A(A + B) + (A + B) - A(A + B) + A = 
A(A + B)+B. 

Claim: The map A(A + B) + B is self-adjoint. 

Use the previous claim and the fact that (A + B)+ = (A + B)* + = (A + B) + * . 

Claim: A(A + B) + B = B(A + B)+ A . 

Use the previous claim and (A(A + B) + B)* = B(A + B) + A. 

Claim: Kernel(\(A, B)) = Kernel. A + Kernel. B . 

By the formulas for the quasi-projection, we see that its kernel contains Kernel. A and 
Kernel. B and hence Kernel. A + Kernel. B. We need to prove the other inclusion; in other 
words, we want to see that the following statement is true: 

Vz G Kernel. (\(A,B)),3x,y e V, z = x + y, Ax = 0, By = 0. 

Consider any z satisfying = \(A,B)z = 2A(A + B) + Bz. Take x := (A + B) + Bz. Note 
Ax = 0. Using Lemma 2 again we also have Bx = (A + _B)x = (A + -B)(A + B) + Bz = Bz. 
Hence B(z — x) = 0. We can take y := z — x. 
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Claim: Range(\(A, B)) = Range. A n Range. B 

Take orthogonal complements of the previous claim. 

Claim: The map \{A,B) is positive semi-definite. 

Note that the range of A + B is an invariant subspace of \(A, B). Clearly we only need 
to see that the restriction of \(A,B) to this range is positive semi-definite. Consequently 
we assume that V = Range(A + B). In this case we have \{A,B) = 2A(A + B)^B = 
2B(A + B)~ 1 A. We now view A and B as matrices. Since A + B is positive definite and 
A is self-adjoint, we can simultaneously diagonalize these two matrices by a congruence. 
(See, for example, Bellman [1970].) In particular, there is an invertible matrix M and a 
diagonal matrix D := diag(aj, . . . , a 2 n ) such that M(A + B)M* = I and MAM* = D. 
We see from these equations that E := MBM* is also a diagonal matrix; in particular, 
E = diag(b\, . . . , b^) where the b\ are defined by a} + b\ := 1. Now we have (by the formula 
for the quasi-projection operator): 

M(\(A, B))M* = 2MAM*(M(A + B)M*)~ 1 MBM* 
= 2diag(albl,...,a 2 n b 2 n ). 

Thus M(\(A, B))M* is positive semi-definite and hence \(A,B) is positive semi-definite. 
□ 



3 An iso-spectral flow which preserves zeros. 

As above, let AC {1, 2, . . . , n} x {1, 2, . . . , n} be a set of pairs of indices which satisfy 
the following conditions for all i,j = 1,2, ... ,n: 

(M)6A, (nzl) 
€ A implies G A. (nz2) 
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Recall that we are using Sym(n) to denote the vector space of symmetric nxn matrices and 
we are using Sym(A) to denote the subspace of Sym(n) consisting of the symmetric matrices 
which are zero outside of A. The set A of pairs of indices represents the nonzero pattern 
of interest. The first condition on A implies that the diagonal matrices are a subspace of 
Sym(A). The second condition simply says that the pattern A is symmetric. We want to 
consider some iso-spectral flows on Sym(A). 

We use [X, Y] := XY — YX to denote the commutator of two square matrices. Note 
that if X is symmetric and K is skew-symmetric then [X, K] is symmetric. Furthermore, we 
use 0{n) to denote the orthogonal group. For a symmetric matrix X, let 

uj.X := 0(n) -> Sym(n) : Q i-> QXQ T . 

Then the image of uj.X is the iso-spectral surface, Iso(X), determined by X. We can regard 
uj.X as a map from one manifold to another. In particular we can differentiate this map at 
the identity I to obtain the following linear map: 

D(uj.X).I = Tan.O(n).I -> Tan.Sym(n).X : K h-> [K,X\. 

The space tangent to 0{n) at the identity I may be identified with the skew-symmetric 
matrices; in symbols, 

Tan.O(n).I = Skew(n) := {K G R nxn : K T = -K}. 

(See, for example, Curtis [1984].) Clearly we can also identify Tan.Sym(n).X with Sym(n). 
Hence, we define a map l.X as a linear map from Skew(n) to Sym(n) by 

l.X := D(u.X).I = Skewin) -> Sym{n) : K ^ [K,X\. 

It is not hard to prove that the space tangent to Iso(X) at X is the image of the linear map 
D(u.X).I; in symbols, 

Tan.Iso(X).X = {[K,X] : K G Skew(n)}. 

(For details see Warner [1983] chapter 3: Lie groups, section: homogeneous manifolds.) 
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Remark: Note that if X has distinct eigenvalues then (by the spectral theorem) the map 
l.X is injective. However, if some of the eigenvalues of X are repeated then l.X is not 
injective. This is one of the reasons that we prefer to use quasi-projection rather than 
projection. See the appendix which compares projections and quasi-projections. 

Recall that we are using the Frobenius inner product on n-bj-n matrices: (X,Y) : = 



Trace{XY T ). We list a few properties of this inner product: (XY, Z) = (X,ZY T ) = 



since, for every symmetric matrix Y and every skew-symmetric matrix K, ([K, X],Y) = 
(K, [Y,X]). The composition of l.X with its adjoint is a "double bracket": 



Note that for any Y E Sym(n), we have that (l.X)(l.X)*.Y is tangent to the iso-spectral 
surface Iso(X) at X. 

We shall also use the map m : Sym(n) — > Sym(A) which is defined as follows: For any 
symmetric matrix Y, let m.Y denote the matrix defined by m.Y(i,j) := Y(i,j) if is 
in A and m.Y(i,j) := if is not in A. Note that m is the orthogonal projection of 
Sym(n) onto Sym(A). In particular, we have m = m* = m 2 . 

We want to consider vector fields on Sym(A) which are iso-spectral. We can obtain 
such vector fields by quasi-projection. Let v : Sym(n) — > Sym(n) be any smooth map on 
Sym(n). From v we can obtain an iso-spectral vector field on Sym(A) by quasi-projection as 
follows. For any symmetric matrix X, let p.X := \(A.X,B.X) be the quasi-projection map 
determined by A.X := (l.X)(l.X)* and B.X := m. Since these latter two linear maps are 
positive semi-definite, the results of the last section apply here. We shall use those results 
without explicitly citing particular propositions. In particular, note that for any symmetric 




(l.X)* = Sym(n) -> Skew(n) : Y i-> 



(l.X)(l.Xy = Sym(n) -> Sym(n) : Y i-> 



[[Y,X],X]. 
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matrix Y, the symmetric matrix p.X.Y is in the intersection of the range of (l.X)(l.X)* and 
m; in symbols, 

p.X.Y G Tan.Iso(X).X n Sym(A). 
We have the following iso-spectral vector field on Sym(A): 

Sym(A) -> Sym(A) : X h-> p.X(v.X). 

The corresponding differential equation is X' = p.X(v.X). We can rewrite this differential 
equation as a differential (linear) algebraic equation as follows: 

X'= (l.X)(l.X)*(X + v.X), 
(l.X)(l.X)*(X + v.X) = m(-A + v.X). 

Note that the second of these equations is a linear equation for the unknown symmetric ma- 
trix A. The vector field is determined by solving this second equation for A and substituting 
the solution into the first equation. 

Using the formulas for l.X and (l.X)*, we get 

(l.X)(l.X)*(X + v.X) = [[A + v.X, X],X\. 

Substituting this simplification into the differential algebraic equation, we get 

X' = [[X + v.X,X],X], 
[[A + v.X, X], X] = m(-X + v.X). 

We now turn our attention to a specific flow. This flow is determined by the optimization 
problem (Problem 1) that we mentioned in the introduction. We shall see that we can solve 
this problem by finding a vector field on Sym(A) associated with the objective function / 
which is iso-spectral. We obtain X — D for the gradient of / at X, in symbols V ' f.X = X — D. 
We can get an iso-spectral vector field by orthogonal projection of ^7 f.X onto the intersection 
Tan.Iso(X).X n Sym(A). We prefer to quasi-project instead. (We explain this preference 
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in an appendix.) We simply substitute the negative of the gradient into the formulas given 
above. We get the following system: 

X' = [[\ + D,X],X], (del) 
[[\ + D,X],X] = -m(\ + X-D). (de2) 

We call the flow generated by this system the quasi-projected gradient flow determined 
by the objective function /. We summarize the properties of this flow in the following 
proposition. 

Proposition 5 Let D be a symmetric matrix. Then the system (del) and (de2) generating 
the quasi-projected gradient flow has the following properties: 

(i) The quasi-projected gradient flow preserves eigenvalues and the nonzero pattern of in- 
terest. 

(ii) The function f(X) := (1/2) (X — D,X — D) is non-increasing along solutions of this 
system. 

(Hi) A point E e Sym(A) is an equilibrium point of this system iff it satisfies the conditions 

[A + D, E] = 0, and (el) 
m(A + E - D) = 0. (e2) 

for some symmetric matrix A. 

(iv) If a matrix E e Sym(A) commutes with D then E is an equilibrium point of this 
system. 

Proof: (i) That this flow preserves eigenvalues and the nonzero pattern of interest is clear 
from the discussion above. The vector field was chosen to have these properties. In particular, 
the vector field preserves the nonzero pattern because X' = — m(X + X — D) has the nonzero 
pattern of interest. Also the vector field preserves eigenvalues because X' = [[A + D,X],X] 
is tangent to the iso-spectral surface Iso(X) at X. 
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(ii) Let X(t) be any solution of the differential equation. Then, since the quasi-projection 
operator p.X = \((l.X)(l.X)*,m) is positive semi-definite, we have 

(/(X))' = (Vf.X,X') = (Vf.X,p.X(-Vf.X)) 
= -(Vf.X,p.X(Vf.X)}<0. 

(iii) Let E e Sym(A) satisfy conditions (el) and (e2). Then clearly [[A + D,E],E] = 
[Q : E] = and E is an equilibrium point of the system (del, de2). On the other hand, if 
E G Sym(A) is an equilibrium point then (del) implies [[A + D, E],E] = and (de2) implies 
(e2). We then also get 

0={[[\ + D,E},E},\ + D) = {[\ + D,E},[\ + D,E}), 

which implies (el). 

(iv) Take A := D — E . Then (e2) is trivially satisfied and for (el) we have 

[A + D, E] = [2D — E,E\ — 2[D, E] = 0. 

□ 

Remark: We should say a few words about convergence of this system. (We intend to 
discuss convergence more fully in a future paper.) Note that the map uj.A is a smooth map 
from 0{n) onto Iso(A). Hence Iso(A) is compact since 0{n) is compact. From part (i) of 
the proposition, we then see that every solution starting in the iso-spectral surface Iso(A) 
remains in that surface and is entire. (In particular, "blowup" is not possible.) Again using 
compactness, we see that every such solution has cu-limit points. If the equilibrium points 
on the iso-spectral surface are isolated (which we expect is usually true) then every solution 
that starts in the iso-spectral surface tends to an equilibrium point. 

Note that if D is a diagonal matrix with distinct diagonal entries and E is a diagonal 
matrix then E commutes with D. It follows from part (iv) of the theorem that E is an equi- 
librium point of the quasi-projected gradient flow determined by D. In 2001 we conjectured 
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E :-- 



that diagonal matrices were the only equilibrium points of this flow. In 2005 Bryan Shader 
found a counterexample to that conjecture. Here is a counterexample. 

Example: Let a and b be real non-zero parameters and consider the non-diagonal, sym- 
metric matrix 

'0 a 
a b 

V° " °j 

The matrix E has the distinct eigenvalues and ±V a 2 + b 2 . 

We set A as the non-zero pattern of E. We show, by suitably defining matrices D and A, 
that E is an equilibrium point of the quasi-projected gradient flow, i.e. satisfies conditions 
(el) and (e2). 

Let y and z be real parameters and take \+D := yE+zE 2 . This clearly gives [\+D,E] = 
0, i.e. condition (el) is satisfied. Furthermore, 



A 
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m(\ + E — D) = m(\+D+E-2D) = m((y+l)E + zE 2 )-2D = (y + l)E + z-diag(E 2 )-2D , 



where 



(a 2 



E 2 = 



ab 
a 2 + b 2 
ab b 2 



(a 2 



and diag(E 2 ) : = 





a 2 + b 2 
b 2 



\ 



Now, by choosing y = — 1 and defining D :— |z • diag(E 2 ), we arrive at m(A + — £>) = 0, 
i.e. condition (e2) is satisfied. Furthermore, if z ^ and |a| 7^ |6| then D has the required 
distinct diagonal entries. 

A numerical experiment shows that the equilibrium point E with a := 1 and b := 2 and 
2: := 2 is not stable. 



We now conjecture that if D is a diagonal matrix with distinct diagonal entries then 
diagonal matrices are the only stable equilibrium points of the quasi-projected gradient flow 
determined by D. 
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Remark: A set S in a topological space T is called nowhere- dense if the interior of its 
closure is empty. A set S C T is called generic if it is open and dense. Note that if S is 
closed then it is nowhere-dense iff T\S is generic. 

Let V be a (finite-dimensional) vector space over the reals R. Let / : V — > R be a 
real-valued function on V. Note that if /(x) is a polynomial in the components of x with 
respect to some basis for V then / has this property for every choice of basis. In this case 
we say that / is a polynomial function. 

Proposition: Let / : V — > R be a polynomial function. If / is not the zero polynomial 
then the variety, Variety (f) := {x G V : f(x) = 0}, of / is nowhere-dense. 

Remark: We use the standard topology on V. 

Proof: Note the variety is closed. Suppose that the variety is not nowhere-dense. Then 
/ vanishes on an open subset of V. It follows that / is identically 0. □ 
Here is an application involving determinants. 

Example: Consider the determinant function det : R nxn — > R. The set {M G R nxn : 
det M = 0} is nowhere-dense and closed. Hence the set of non-singular n x n matrices is 
generic. 

Let V and W be vector spaces and let / : V — > W be a map. Then / is a polynomial 
map if the components fi(x), for i = 1, dim.W, with respect to some basis for W are 
polynomial functions. Note that the composition of two polynomial maps is a polynomial 
map. 

At the beginning of Section 4, we will introduce the assumption that the map (A.X+m) : 
Sym(n) — > Sym(n) with AX = (l.X)(l.X)* = [[-,X],X] is invertible for given X G Sym(n). 
Here we show that this is generic behavior if X has distinct eigenvalues. Hence, consider the 
map A.X := Sym(n) — > Lin(Sym(n) — > Sym{n)) : I h (y h [[Y, X],X]). 

Proposition. The set {X G Sym(n) : Range(A.X + m) — Sym(n)} is generic. 

Proof: Consider the polynomial function Sym(n) — > i? defined by X i— > det (AX + m). 
(Here det is regarded as a real-valued function on the space of linear maps Lin(Sym(n) — > 
Sym{n)). Note that A and m are polynomial maps.) We show below that this function is not 
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the zero function. Then we have that {X e Sym(n) : det(A.X + m) = 0} is nowhere-dense. 
Since this set is also closed we have that 

{Xe Sym{n) : Range(A.X+m) = Sym(n)} = Sym(n)\{X e Sym(n) : det(AX+m)= 0} . 

is generic. To complete the proof, we show that the function X i— > det(A.X + m) is not the 
zero function. Take X = D := diag(d± : ...d n ) where the di are distinct. Then A.D.Y(i,j) = 
(di — dj) 2 Y(i, j). Furthermore, the range of m includes the diagonal matrices. These prop- 
erties together show that Range(A.D + m) = Sym(n) and hence det(A.D + m) ^ 0. □ 





4 Numerical results. 

We have implemented the quasi-projected gradient flow in a Matlab program. This flow is 
iso-spectral and preserves zeros as discussed in the previous section. In our implementation 
we assume that Range((l.X)(l.X)* + m) — Sym(n). We solve numerically for t > the 
initial value problem for X(t) given by 

X' = g(X) := 2m ((l.X)(l.X)* + m)" 1 (l.X)(l.X)*.(D - X), X(0) = X , 

where X e Sym(A) (A is defined by the nonzero pattern of X and kept constant) and D 
is the diagonal matrix, D := diag(l, 2, . . . , n). We refer to this flow as the Zero flow in the 
discussion of the examples and in the figures below. 

The assumption on the ranges of (l.X)(l.X)* and m guarantees the existence of the 
inverse in the right-hand side of the differential equation. This assumption is not satisfied 
in general as the following example demonstrates. 

Example: Let X be the circulant matrix with —2 on the diagonal and 1 on the first sub- 
and super-diagonal (and the corresponding corner entries). The pattern A is defined as the 
nonzero pattern of X. Now let Y be any circulant matrix with nonzero pattern completely 
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outside of A, i.e. m.Y = 0. If n — 4 this is, for instance, achieved by selecting Y as the 
matrix with ones on second sub- and super-diagonal. Since circulant matrices commute with 
each other, and by the choice of the nonzero pattern of Y we have ((l.X)(l.X)* + m)Y = 0. 
Thus Y is a non-trivial element in the kernel of the map and hence the inverse does not 
exist. 



By construction of the flow, the matrix g(X(t)) E Sym(A) for all t > and when 
integrating the differential equation we ignore all matrix elements outside the pattern A 
(these remain zero for all t > 0). Therefore the dimension of our differential equation is 
reduced to the cardinality of A which may be significant less than n 2 . (We have currently 
not taken into account the symmetry of the matrices.) However, we remark that we obtain 
intermediate matrices, when evaluating the expression for g(X(t)) from the right to the left, 
which can have nonzero entries outside of A. 

For the numerical solution of the initial value problem we employ Matlab's explicit Runge- 
Kutta method of order 4(5) (rk45) with absolute and relative tolerance requirement set to 
1CT 13 . These very stringent accuracy requirements reflect the fact that we are currently 
interested in very accurate solutions to the initial value problem and not (yet) in competitive 
numerical schemes for the solution of sparse eigenvalue problems. Therefore, the cost of the 
numerical computations are not considered in the following. 

During the course of integration we monitor two characteristic quantities of the flow. 

I. The relative departure of the matrix X(t) from the iso-spectral surface associated with 
the initial matrix X . In particular, we define 

_ \\ev(X )-ev(X(t))\\ 
CV[) '~ \\ev(X )\\ 

where ev(X) is the vector of sorted eigenvalues of X. This quantity measures the 
quality of the time integration and should be approximately constant in time and near 
the machine accuracy (10~ 14 ). 
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2. The relative size of the off-diagonal elements of X(t) (with respect to X ). In particular, 
we define 

_ \\X(t)-diag(X(t))\\ F 
\\X - diag(X )\\ F ' 

where diag(X) is the matrix containing the diagonal part of X and || • ||^ is the 
Frobenius norm. This quantity measures the convergence of the flow to a diagonal 
steady state and in conjunction with a constant value of d ev (t) the convergence to the 
diagonal matrix with elements corresponding to the eigenvalues of X . 

We compare the Zero flow with the "double-bracket (DB) flow". That is, we also numer- 
ically solve the initial value problem 

X' = h(X):=[[D,X],X], X(0) = X , 

where X e Sym(n) and D is the same diagonal matrix as above. This flow is also iso-spectral 
and converges to a diagonal matrix steady state with the eigenvalues of X on the diagonal. 
(See the appendix on the Toda flow and/or Driessel [2004].) It does not preserve the zero 
pattern of the initial matrix Xo and considerable fill-in can appear. The double-bracket flow 
coincides with the Toda flow if X is a tridiagonal matrix. 

We consider three different kinds of initial data in the next three subsections. 



4.1 Example 1 

Here the initial value X is a symmetric, tridiagonal random matrix of dimension 6: 



Xo := 
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We note that the DB flow preserves the tridiagonal pattern but we do not exploit this fact 
in our implementation. 



We simulate the solution with this initial value until t = 60 for both flows. The maximum 
value of d ev (t) pa 7 • 10~ 14 for the Zero flow and ~ 2 • 10 -14 for the DB flow. This shows that 
for both flows the eigenvalues of the initial matrix are preserved up to machine accuracy in 
the numerical solution. In Figured we plot the monitored values of d ff(t) for both flows. 
We observe that both converge to zero and that this happens slightly faster for the DB flow 
initially but later the Zero flow converges faster and reaches machine accuracy before the 
DB flow. The results of this example show that for tridiagonal matrices the Toda flow is 
different than our zero flow. 
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Figure 1: Convergence history of the off-diagonal elements of the solution of Example 1 for 
the Zero and the DB flow. 
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4.2 Example 2 

In this example the initial value X is a symmetric random matrix of dimension 10 with a 
random zero pattern: 



X : = 
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We simulate the solution with this initial value until t = 60 for both flows. The maximum 
value of d ev (t) ~ 2 ■ 10™ 14 for the Zero flow and ~ 6 ■ 10~ 15 for the DB flow. This shows that 
for both flows the eigenvalues of the initial matrix are preserved up to machine accuracy in 
the numerical solution. In Figure we plot the monitored values of d ff(t) for both flows. 
We observe that both converge to zero and that this happens slightly faster for the Zero 
flow. 

4.3 Example 3 

In the third example we consider tridiagonal matrices which arise when one discretizes the 
boundary value problem u xx = 0, u(0) = u(l) = by standard second-order central differ- 
ences. Let T n := tridiag(l, —2, 1) 6 Sym(n) and T n := {n + l) 2 T n . Hence T n corresponds 
to the discretization matrix of the boundary value problem on an equidistant grid with grid 
width h := l/(n + 1). The eigenvalues of both, T n and T n , are distinct and negative. We 
present results for the four cases X = T 5 ,T W ,T 5 , and T w in Figure 01 

We run these experiments to different final times as can be seen in the plots. We note that 
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10 10 20 30 40 50 60 

Time t 

Figure 2: Convergence history of the off-diagonal elements of the solution of Example 2 for 
the Zero and the DB flow. 

the values of d ev (t) are in the range 10~ 15 to 1CT 13 for all values of t considered. Again this 
demonstrates that the numerical solution does only insignificantly drift off the iso-spectral 
surface associated with the initial matrix. Both the Zero flow and the DB flow converge 
to the diagonal matrix containing the eigenvalues of the initial data. However, whereas the 
Zero flow does so much faster than the DB flow for the matrices T n , the situation is the 
opposite for the scaled matrices T n . The change in the convergence speed of the DB flow for 
different initial matrices T n and T n is precisely explained by the following proposition. 

Proposition 6 If X(t) is the solution of the double-bracket flow with initial value X then 
cX(ct) is the solution of the double-bracket flow with initial value cX , c > 0. 

Proof: We can write h(cX) = [[D, cX], cX] = c 2 [[D, X], X] = c 2 h(X). This relation gives 
the desired scaling result. □ 



We have not analyzed how scaling affects the Zero flow. 
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Time t Time t 

Figure 3: Convergence history of the off-diagonal elements of the solutions of Example 3 
for the Zero and the DB flow for initial conditions X = T 5 (top left), X = T 5 = 36T 5 (top 
right), X = T 10 (bottom left), and X Q = T 10 = 121T 10 (top right). 

A Comparing Projections with Quasi-Projections. 

Recall the following well-known result. (See, for example, Leon [1986] Section 5.5: Least- 
squares problems or Strang [1980] Section 3.2: Projections onto subspaces and least-squares 
approximation.) 

Proposition 7 (Least squares approximation). Let L : U — > W be a linear map from 
one inner product space to another. If L is injective then, for any b e W , the "normal 
equation" L*Lx = L*b has a unique solution which is given by (L*L)~ 1 L*b. Furthermore, 
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the linear map P := L(L*L) 1 L* on W has the following properties: 

1. The range of P equals the range of L: Range. P = Range. L. 

2. The kernel of P equals the orthogonal complement of the range of L: Kernel. P = 
(Range. L)^ . 

3. The map P is the projection ofW on Range. L along (Range. L) 1 - which corresponds to 
the decomposition W = Range. L © (Range. L) L . In particular, P 2 = P and P* = P. 

The map L + = (L*L)^ 1 L* : W — > U is the Moore-Penrose pseudo-inverse of L. The map 
P = LL + is the projection map associated with the least squares problem Lx ~ b. 

The projected vector Pb is the element of the range of L which is closest to b in the least 
squares sense. 

Driessel [2004] observed the following: It is often difficult to directly use the projection 
map P. If L is not injective then the inverse of L*L does not exist. Even when L is injective 
it is often difficult to compute (L*L)~ 1 - for example, if L is ill-conditioned or the dimension 
of the vector space is large. We can often avoid these difficulties by using the linear map 
LL* : W — > W instead of the projection map P. For the map L*L we have the following 
analogue of the last proposition. 

Proposition 8 (Quasi-projection) Let L : U — > W be a linear map between two inner 
product spaces. Then the map LL* :W-^W has the following properties: 

1. The range of LL* equals the range of L: Range(LL*) = Range. L. 

2. The kernel of LL* equals the orthogonal complement of the range of L: Kernel (LL*) = 
(Range. L) 1 - . 

3. The map LL* is positive semi- definite. 

We include the proof of this proposition from Driessel [2004] for completeness. 

Proof: Here is the proof of the first assertion. It is obvious that Range(LL*) C Range. L. 

We want to see the other inclusion. Consider any element Lx in the range of L. Let 
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x = y + z where y G (Kernel. L) 1 - and z G Kernel. L. Since Range. L* = (Kernel. L) 1 - we 
have Lx = Ly is an element of Range(LL*). Here is the proof of the third assertion: 

(u,LL*v) = (L*u,L*v) = (L**L*u,v) = (LL*u,v) 

since L** = L. Finally we consider the second assertion. By the first and third assertions we 
have 

(Range.L) 1 - = (Range(LL*)) L = Kernel(LL*)* = Kernel(LL*). 

□ 

Driessel [2004] called the map LL* : W — > W the quasi-projection map associated 
with the least squares problem Lx ~ b. Driessel [2004] also compared the projection P 
with the quasi-projection LL* as follows: Since the restriction LL* : Range.L — > Range.L of 
LL* is self-adjoint, we can find a basis of Range.L consisting of eigenvectors: LL*Wi = AjWj 
for i = 1,2, ... ,m where m is the dimension of Range.L. For any w&W let w = r + s where 
r G Range.L and s G (Range.L)- 1 . We have Pw — r — T,(r,Wi)wi and LL*w = LL*r = 
T.(r,Wi)\iWi. Thus LL* is a projection followed by an eigenvalue-eigenvector scaling. (Also 
note that LL*P = PLL* = LL*.) Note that each \ is non-negative. It follows that 
the signature of P is the same as the signature of LL*. We regard congruence as the 
appropriate geometry for the study of quasi-projections. In summary, we regard the use of 
the quasi-projection operators LL* as simpler, more direct and more robust than the use of 
the projection operator P. 

We want to establish propositions like the last two for a pair of linear maps. Let U, V 
and W be finite-dimensional inner product spaces and let L : U — > W and M : V — > W be 
linear maps. We consider the following problem: 

Problem 2 (Projection) Given a vector c G W , find the vector c G W which is in the 
intersection Range.L n Range. M and is closest to c. 
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We can formulate this problem as a constrained optimization problem as follows. Let 

f.c := UxV^R:(x,y)^(l/2)((Lx-c,Lx-c) + (My-c,My-c)), 
k := U x V -> W : (x, y) i-> Lx - My. 

Problem 3 ( Optimization) Given c G W , find the pair inUxV which minimizes f.c(x, y) 
subject to the constraint k(x,y) = 0. 

If (u, v) is the solution of this optimization problem then c = Lu = Mv is the solution of 
the projection problem. 

We begin our analysis of the optimization problem by computing the derivative of the 
objective function f.c. We have 

D(f.c)(x,y)(dx,dy) = (Lx-c, L dx) + (My-c, M dy) = (L* (Lx - c) , dx) + (M* {My - c) , dy) . 

We use the standard Cartesian inner product on U x V; that is, for (x, y) and (x 1 , y') in 
U x V, we take ((x,y), (x',y')) := (x,x f ) + (y,y'). From the equation for the derivative of 
f.c, we easily recognize the gradient of f.c: 



Let (u, v) e U x V be the solution of the optimization problem. By the well-known 
Lagrange multiplier theorem, we have the condition 



Since k is linear, we have Dk(u, v) = k. Next we compute k*: For x G U, y G V , and z eW 
we have 



V(f.c)(x,y) = (L*(Lx - c),M*(My - c)). 



V(f.c)(u,v) G (Kernel(Dk(u,v))) 



Range(Dk(u, v))*. 



(k(x,y),z) 



(Lx - My, z) = (Lx, z) - {My, z) 



(x,L*z) - (y,M*z) = ((x,y),(L*z,-M*z)). 
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In other words, k* = W ^ U x V : z ^ (L*z,—M*z). Hence the Lagrange condition 
V(f.c)(u,v) G Range.k* is equivalent to the following one: 

3A G W, L*(Lu - c) = L*X and M*(Mv - c) = —M*X. 

Adding the constraint condition, we get the following system of (linear) equations for X,u 
and v: 

L*Lu-L*X = L*c, 
M*Mv • MX = M*c, 
Lu = Mv. 

We now assume that L and M are injective. Then L*L and M*M are invertible. We can 
apply (block) Gaussian elimination to the last system of equations; we get: 

u — L + X = L + c, 
r ■ M X = M <: 
(LL + + MM + )X = (—LL + + MM + )c 

where L + = (L*L) _1 L* and M + = (M*M)~ 1 M* are the Moore-Penrose pseudo-inverses of 
L and M respectively. This last set of equations implies the following set: 

Lu-PX = Pc, 
Mv + QX = Qc, 
(P + Q)X = {-P + Q)c 

where P := LL + and Q := MM + . Note that P and Q are the orthogonal projections of W 
onto Range. L and Range. M respectively. 

For any vector c in W, we are led to consider the following system of linear equations for 
w and A in W: 

w-PX = Pc, (pi) 
{P + Q)X = {-P + Q)c ( P 2) 
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(We get this set of equations from the preceding set by setting w := Lu = Mv and then 
omitting the redundant second equation.) Note that these are the quasi-projection equations 
determined by P,Q, and c. Since P and Q are positive semi-definite, the results in the section 
on quasi-projections apply. In particular, we have the following corollaries. 

Corollary 1 (Uniqueness) For any c in W, if (u>i, Ai) and {w 2l A 2 ) are solutions of the 
equations (pi) and (p2) then ui\ = W2, P\\ = P\ 2 and Q\i = Q\ 2 . 

Note that A is uniquely determined iff P + Q is surjective. 

Corollary 2 (Existence) For all c in W there exist w and A in W satisfying (pi) and 
( P 2). 

As in the section on quasi-projections, we use !(P, Q) to denote the linear operator on W 
which maps a vector c to the unique vector w which satisfies the following condition: There 
exists A in W such that the pair (w, A) satisfies equations (pi) and (p2). 

Corollary 3 (Quasi-Projection Formulas) The quasi-projection operator \(P,Q) satis- 
fies 

\(P, Q) = 2P{P + Q) + Q = 2Q{P + Q) + P. 
Furthermore, \(P,Q) is the ortho-projection ofW on Range. P D Range. Q. 

Proof: Claim: If c G Range. P fl Range. Q then !(P, Q)c = c. 

We have Pc = Qc = c. It follows that taking w := c and A := gives us a solution of 
(pi) and (p2). □ 



In summary we have the following analogue of the proposition concerning least squares 
approximation involving a single linear map. 

Proposition 9 Let U, V and W be inner product spaces and let L : U — > W and M : V — > 

W be injective linear maps. Let P := LL + and Q := MM + . Then the map \(P,Q) has the 
following properties: 
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1. The range of\(P, Q) equals the intersection of the ranges of L and M : Range.\(P, Q) = 
Range. L D Range. M. 

2. The kernel of\(P, Q) equals the orthogonal complement of the intersection of the ranges 
of L and M: Kernel.\(P,Q) = (Range. L n Range. M)^ . 

3. The map \(P,Q) is the projection of W onto Range. L n Range. M along (Range. L n 
Range. M) 1 - which corresponds to the decomposition 

W = (Range. L n Range. M) © (Range. L n Range. M)- 1 . 

We also want to establish an analogue of the proposition concerning the quasi-projection 
associated with a single linear map. We do so by setting A := LL* and B := MM*. We 
then consider the quasi-projection equations determined by c e W and these maps: 

w - A\ = Ac, (ql) 
(A + B)X — (B — A)c. (q2) 

Note that A and B are positive semi-definite. Hence the results in the section on quasi- 
projections apply. In particular, we have the following result. 

Proposition 10 Let U , V and W be inner product spaces and let L : U — > W and M : V — > 
W be linear maps. Let A := LL* and B := MM* . Then the map \(A, B) has the following 
properties: 

1. The range of\(A, B) equals the intersection of the ranges of L and M : Range.\(A, B) = 
Range. L D Range. M. 

2. The kernel of\(A, B) equals the orthogonal complement of the intersection of the ranges 
of L andM: Kernel A(A, B) = (Range. L n Range. M)^ . 

3. The map \(A,B) is positive semi- definite. 
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We regard the use of the quasi-projection operator \(A,B) as simpler, more direct and 
more robust than the use of the projection operator !(P, Q). In particular, we do not need to 
compute and (M*M) _1 when using \(A, B). The signature of \(A, B) was determined 

in the proof of the proposition concerning quasi-projection formulas in the section on quasi- 
projections. It is easy to see that \(A, B) and !(P, Q) have the same signature. 

B On the Toda flow. 

The flows that we describe are related to the QR algorithm and the Toda flow. For a square 
matrix X let X h X d and X u denote the strictly lower triangular, diagonal and strictly upper 
triangular part of X. The Toda flow or QR flow is the flow associated with the following 
differential equation: 

X' =[X,X l -Xf}. 

We use [X, Y] := XY — YX to denote the commutator of two square matrices. Note that 
Xi — Xj is skew-symmetric. Also note that if X is symmetric and K is skew-symmetric then 
[X, K] is symmetric. Hence we can (and shall) view the Toda flow as a dynamical system in 
the space of symmetric matrices. 

For symmetric matrix X, consider the following map determined by X: 

uj.X := 0(n) -> Sym{n) : Q ^ QXQ T . 

Note that the image of this map is the iso-spectral surface Iso(X). We differentiate uj.X to 
get the following linear map: 

D(u.X).I = Tan.O(n).I -> Tan.Sym(n).X : K ^ [K,X\. 

Recall that the space tangent to 0{n) at the identity I may be identified with the skew- 
symmetric matrices; in symbols, 

Tan.O{n).I = Skew{n) := {K e R nxn : K T = -K}. 
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(See, for example, Curtis [1984].) Clearly we can also identify Tan.Sym(n).X with Sym(n). 
We shall regard D(u.X).I as a linear map from Skew(n) to Sym(n). It is not hard to prove 
that the space tangent to Iso(X) at X is the image of the linear map D(u>.X).I; in symbols, 

Tan.Iso(X).X = {[K,X] : K G Skew(n)}. 

(For details see Warner [1983] chapter 3: Lie groups, section: homogeneous manifolds.) Since 
the vector field X i— > [X, X\ — Xf] of the Toda flow is tangent to Iso(X), it follows that the 
Toda flow is iso-spectral, that is, it preserves eigenvalues. It is well-known that the Toda flow 
is iso-spectral; for details, see, for example, Demmel [1997] Section 5.5: "Differential Equa- 
tions and Eigenvalue Problems" and the references there. The relationship between the Toda 
flow and the QR algorithm is also fairly well-known; again see, for example, Demmel [1997]. 

The QR algorithm and the Toda flow do have limited zero-preserving properties. We say 
that a symmetric pattern of interest A is a staircase pattern if A is "filled in toward the 
diagonal", that is, for all i < j, if is in A then so are — 1) and + Arbenz and 
Golub [1995] showed that the QR algorithm preserves symmetric staircase patterns and only 
such sparseness. Ashlock, Driessel and Hentzel [1997a] showed that the Toda flow preserves 
symmetric staircase patterns and only such sparseness. Here we aim to preserve arbitrary 
sparseness. 

Remark: For an earlier attempt to generalize the Toda flow to other zero-preserving flows, 
see Ashlock, Driessel and Hentzel [1997b]. This attempt had only very limited success. Chu 
and Norris [1988] designed flows on the symmetric matrices which converge to Sym(A) for 
various A's. In other words, given A and a symmetric matrix, their flows converge to a 
symmetric matrix with nonzero pattern A. We do not know if the zero-preserving properties 
of these flows have been studied. Driessel [2004] generalizes the Toda flow in a different way 
than we do here. 

We want to describe a geometrical explanation for the zero-preserving property of the 
Toda flow. This geometrical reason apparently is not well-known. (See, however, Symes [1980a, 
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1980b, 1982].) Let Upper (n) denote the group of invertible upper triangular matrices; in 
symbols, 

Upper (n) := {U E Gl(n) : i > j =>■ U(i,j) = 0} 

where Gl{n) denotes the group of invertible n x n matrices. Let upper (n) denote the linear 
space of upper triangular matrices; in symbols, 

upper [n) := {R G R nxn : i > j R(i,j) = 0}. 

Note that the space tangent to the matrix group of invertible upper triangular matrices at 
the identity may be identified with the space of upper triangular matrices; in symbols, 

Tan.Upper{n) .1 = upper (n). 

Note that the space of square matrices R nxn is the direct sum of the space of symmetric 
matrices and the space of strictly upper triangular matrices since 

X = X l + X d + X u = {X l + X d + Xf) + (X u -Xf). 

Let a : R nxn — > Sym(n) denote the corresponding projection; in symbols, 

a.X:=Xi + X d + Xl. 

We consider the following map: 

a := Upperin) x Sym(n) -> Sym(n) : (U,X) -> a{UXU- 1 ). 

Proposition 11 The mapping a is a group action. 

Proof: Note a(U u a(U 2 , X)) = a(XJ x U 2 , X) iff 

g{\J 1 g{\J 2 XU 2 1 )U{ 1 ) = a{U x U 2 XU^U^). 

Let Y be defined by a^^XU^ 1 ) + Y := U 2 XU 2 ^ 1 - Note that Y is strictly upper-triangular. 
Then 

U^a^XU^ 1 ))^ 1 + UiYUi 1 = U X U 2 XU 2 X U^. 
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Note a(U 1 YU 1 x ) = since U 1 YU 1 1 is strictly upper triangular. □ 



For any symmetric matrix X, we have the orbit of X under this action: 

Orbit(X) = a.Upper(n).X = {a{UXIJ- 1 ) : U G Upper(n)}. 



Consider the following map determined by X: 



[3.X := Upperin) -> Sym{n) : U i-> aiUXU' 1 ). 

Note that the image of this map is the orbit of X. We differentiate f3.X to get the following 
linear map: 

D(f3.X).I = Tan.Upper{n).I -> Tan.Sym{n).X : i? ^ a[i?,X]. 

As noted above we can identify Tan.Upper{n).I with upper(n). As before, we can also 
identify Tan.Sym(n).X with Sym(n). We shall regard D(j3.X).I as a linear map from 
upper {n) to Sym(n). It is not hard to prove that the space tangent to the orbit of X at X 
is the image of this linear map; in symbols, 

Tan.Orbit{X).X = {a[R,X\ : R e upper{n)}. 

(For details see Warner[1983] chapter 3: Lie groups, section: homogeneous manifolds.) 

Let T denote the tridiagonal symmetric matrix determined by the triple (1,0,1); in 
symbols, 

(o 1 ... 
1 1 ... 
1 



T 



A 







... 1 

o o o ... i oy 

We find the following result rather surprising. (In particular, we do not know the historical 
origin of this result.) 
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Proposition 12 The tridiagonal symmetric matrices with trace equal zero and nonzero sub- 
diagonal (and super- diagonal) entries are the orbit of the matrix T under the action by the 
group Upper (n). 



(The tridiagonal matrices with nonzero sub-diagonal and super-diagonal are often called 
Jacobi matrices.) 

Proof: Note that every element of Upper (n) can be written as the product of an invertible 
diagonal matrix and an element of Upper (n) with only ones on the diagonal. We sketch the 
rest of the proof when n = 3; it should be clear how to generalize these calculations. We use 
* to denote irrelevant entries in matrices. We have 



and 



(l h *\ [o 1 oWi -h 



1 l 2 

ij x 

ll * * 

1 12 — l\ * 

1 -l 2 



1 1 
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di 
d 2 
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Note that for any symmetric matrix X, we have 

[X,X l -X?] = -[X,X d + 2Xf] 

since 

0=[X,X] = [X,X l + X d + X?] 
= [X,X l -X?] + iX,X d + 2X?]. 

Thus we see that the Toda flow can be rewritten as the following "differential algebraic " 
initial value problem: 

x'^ix.x-xfl X(0) = A 

[X,X l -X?] = -[X,X d + 2X?]. 

From the second equation (which holds trivially) we see that not only does the solution stay 
on the iso-spectral surface, but it also stays on the orbit of A under the action by the upper 
triangular group. Thus if A is a tridiagonal matrix with trace zero and X(t) is the solution 
of the differential equation at time t then X(t) is tridiagonal and has trace zero. It is easy 
to see that the zero trace condition can be replaced by a constant trace condition. 

It is not hard to see that these observations concerning symmetric tridiagonal matrices 
generalize to any symmetric staircase pattern of interest. The zero-preserving iso-spectral 
flow that we derive in the main part of this report can be viewed as a differential algebraic 
equation similar to the one we have here. 

The Toda flow is also related to an optimization problem closely related to the one we 
mentioned at the end of the introduction. As there, let D be a symmetric matrix and let 
/ := Sym(n) — > R : X i— > (1/2) (X — D,X — D) be an "objective function". Consider the 
following optimization problem: 

Problem 4 Given a symmetric matrix A, minimize f(X) subject to the constraint that X 
is in Iso(A). 
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This problem is analyzed in Chu and Driessel [1990]. (See also Driessel [2004].) 

Computing the derivative of / we get that, for any symmetric matrices X and H, 
DfX.H — (X — D, H). For the gradient of / at X, we then have Vf.X = X - D. We can 
get an iso-spectral vector field by orthogonal projection as follows. Let l.X := D{u.X).I. 
Recall that, for all skew-symmetric K, D(lu.X).I.K = [K,X]. Note that the adjoint (l.X)* 
of l.X is the following map: 

(l.X)* = Sym(n) -> Skew(n) : Y i-> [Y,X] 

since, for every symmetric matrix Y and every skew-symmetric matrix K, ([K, X],Y) = 
(K,[Y,X]). If l.X is injective then the projection onto Tan.Iso(X).X is the operator 
(l.X)({l.X)(l.Xy)~ l (l.X)* . Instead of using this orthogonal projection, we simply use the 
map (l.X)(l.X)*; in other words, we drop the factor involving the inverse from the projection 
formula. (For more on this matter see Driessel [2004].) We can also then drop the require- 
ment that l.X be injective. Note that (l.X)(l.X)*Y = [[Y,X],X]. In particular, we have 
(l.X)(l.X)*(-V.f.X) = [[D-X,X],X] = [[D, X],X\. We use a "quasi-projection" similar 
to this one in order to derive our zero-preserving iso-spectral flow. 

The double-bracket flow is the flow associated with the following differential equation: 

X'=[[D,X],X}. 

Proposition 13 The double-bracket flow has the following properties: 

1. This flow preserves eigenvalues. 

2. The objective function f is non-increasing along solutions of this flow. 

3. A symmetric matrix is an equilibrium point of this flow iff it commutes with D. 

4- Let D be the diagonal matrix with diagonal entries l,2,...,n. Then, on the space of 
tridiagonal symmetric matrices, this flow coincides with the Toda flow. 
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Proof: For any solution X(t) of the double bracket differential equation, we have 



(f.X)' = (X — D, X') = (X — D, [[D, X],X]) 

= ([X- D, X], [D,X]) = —{[D, X], [D,X]) < 0. 

This inequality shows the / is non-increasing along solutions of this flow. We leave the rest 
of the proof to the reader. □ 
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